ABSTRACT

Generalized hyperexponential (GH) distributions are linear
combinations of exponential CDFs with mixing parameters (positive and
negative) that sum to unity.  The denseness of the class GH with respect
to the class of all CDFs defined on $[0,\infty)$ is established by showing
that a GH distribution can be found that is as close as desired, with
respect to a suitably defined metric, to a given CDF.  The metric induces
the usual topology of weak convergence so that, equivalently, there
exists a sequence of GH CDFs that converges weakly to any CDF.  The
result follows from a similar well-known result for weak convergence of
Erlang mixtures.  Various set inclusion relations are also obtained
relating the GH distributions to other commonly used classes of
approximating distributions, including generalized Erlang (GE), mixed
generalized Erlang (MGE), those with reciprocal polynomial Laplace
transforms ($K_n$), those with rational Laplace transforms ($R_n$), and
phase-type (PH) distributions.  A brief survey of the history and use of
approximating distributions in queueing theory is also included.

Key phrases: probability distribution; cumulative distribution
function; approximation; convergence in distribution;
weak convergence; denseness; Erlang distribution;
generalized hyperexponential distribution; method of stages.
1.  INTRODUCTION


The  purpose  of  this  paper  is  to  characterize  the  class  of 
generalized  hyperexponential  (GH)  probability  distribution  functions  and 
to  justify  their  use  as  convenient  approximations  to  arbitrary  CDFs. 

1.1  Definition

Generalized hyperexponential distribution functions are of the form

$$F(t) = \sum_{i=1}^{n} a_i \left(1 - e^{-\lambda_i t}\right)$$

with $\sum_{i=1}^{n} a_i = 1$, $a_i$ real, $\lambda_i > 0$.  They are
generalizations of the well-known hyperexponential distributions, which
are of the same form but with the additional requirement that the
coefficients $\{a_i\}$ be positive.  The familiar generalized Erlang CDFs
arising as the distributions of a sum of independent, non-identical
exponential random variables are in GH.  A typical example is provided
by the CDF

$$F(t) = 3(1-e^{-t}) - 3(1-e^{-2t}) + (1-e^{-3t})
       = 1 - 3e^{-t} + 3e^{-2t} - e^{-3t}.$$
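As a quick numerical check of the example CDF, the following minimal Python sketch evaluates a GH distribution function from its coefficients and rates.  The function name `gh_cdf` and its argument layout are illustrative assumptions and do not come from the paper.  The sketch also compares the result with $(1-e^{-t})^3$, which equals the example by the binomial expansion.

```python
import math

def gh_cdf(t, coeffs, rates):
    """Evaluate F(t) = sum_i a_i * (1 - exp(-lambda_i * t)) for t >= 0."""
    return sum(a * (1.0 - math.exp(-lam * t)) for a, lam in zip(coeffs, rates))

# Example from the text: a = (3, -3, 1), lambda = (1, 2, 3).
coeffs = (3.0, -3.0, 1.0)
rates = (1.0, 2.0, 3.0)

for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    closed_form = (1.0 - math.exp(-t)) ** 3  # binomial identity for this example
    print(f"t={t:4.1f}  F(t)={gh_cdf(t, coeffs, rates):.6f}  (1-e^-t)^3={closed_form:.6f}")
```

The two printed columns should agree, and the coefficients sum to one even though one of them is negative.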


1.2  Organization

In the following, we first discuss briefly the evolution of
approximations to CDFs in stochastic modeling, particularly in the field
of queueing theory.  Relationships among the classes of approximating
distributions, including GH, are then developed in Section 2.  Section 3
establishes that any CDF can be approximated as closely as desired, with
respect to a suitably defined metric, by a GH distribution.  This fact,
together with the attractive numerical and statistical properties of the
class GH, provides a major justification for considering this class of
approximants.  Finally, Section 4 contains concluding remarks and some
areas for future research.

1.3  Background

The use of approximating distributions in applied probability
modeling  dates  back  at  least  to  the  early  part  of  the  twentieth  century. 
A.  K.  Erlang  used  the  so-called  method  of  stages  to  preserve  the  useful 
properties  of  exponential  distribution  functions  in  situations  where  the 
true  underlying  distributions  were  not  in  fact  exponential  (see,  for 
example,  Cox  and  Miller  [1970]).  By  imagining  customers  in  a  queueing 
situation  to  progress  through  a  series  of  independent  stages  in  tandem, 
with  the  time  spent  in  each  stage  having  an  exponential  distribution,  it 
is  possible  to  preserve  the  Markovian  character  of  the  queueing  system. 
The  memoryless  property  of  such  systems  simplifies  the  resulting 
equations  governing  queue  behavior,  such  as  the  probability 
distributions  of  customer  waiting  time  and  number  of  customers  in  the 
system.  Jensen  [1954]  generalized  Erlang's  technique,  in  part  by 
allowing  the  exponential  stages  to  have  non-identical  parameters. 

Much of the queueing literature makes use of the theory of complex
variables in the frequency domain which results when Laplace transforms
of the probability distributions of interest are computed.  Smith [1953]
noted that the probabilities resulting from the method of stages have
Laplace transforms that are reciprocal polynomials having negative real
roots.  He extended the concept of stages by defining the class $K_n$ to be
all those distribution functions whose transforms are reciprocal
polynomials of degree n with, in general, complex roots.  He then
showed, using Lindley's GI/G/1 formulation, that under mild conditions
on the interarrival and service-time distributions, a service-time
distribution of type $K_n$ implies that the total equilibrium system time
(queueing plus service) is also of type $K_n$.  In particular, if service
time is exponential, so is the system time for any distribution of
interarrival times.

Cox  [1955]  extended  the  concept  of  stages  further  by  considering 
the  class  of  distributions  having  rational  Laplace  transforms.  He 
showed  that  the  method  of  stages  can  still  be  employed  for  this  larger 
class of CDFs if one is willing to tolerate stages having complex roots
and "probabilities" that may be negative.  While the fictitious stages
do not therefore correspond to physical entities, the resulting overall
probabilities will be valid.  The advantage of such an approach is that
the desirable mathematical properties of Markovian systems may be
retained.  Cox went on to provide some justification for restricting
attention to distributions with rational transforms by noting that if
the degree of the polynomials is allowed to be countably infinite, any
CDF can be closely approximated by one having a rational transform.

Wishart [1959] used the method of stages and Markov chains to
verify Smith's result for the equilibrium distribution of waiting
times in a GI/G/1 queue having arbitrary interarrival-time distribution
and service-time distribution characterized by a series of Erlang
stages.


Kotiah  et  al.  [1969]  approximated  the  GI/G/1  queue  by  assuming  that 
both  the  interarrival  and  service-time  distributions  were  Erlangian, 
that  is,  consisted  of  a  series  of  exponential  stages.  They  developed 
numerical  procedures  to  calculate  the  mean  waiting  time  for  the  system 
and  examined  the  effect  of  varying  the  skewness  of  the  interarrival 
distribution.

Schassberger  [1970]  established  the  theoretical  basis  for  some  of 
the  earlier  work  using  the  method  of  stages  to  obtain  waiting-time 
distributions  for  the  GI/G/1  queue.  In  doing  so  he  showed  how  a 
sequence  of  mixtures  of  Erlang  CDFs  may  be  constructed  that  converge 
weakly to any desired distribution function defined on $[0,\infty)$.

Neuts  [1975,  1981]  has  popularized  the  class  of  phase-type,  or  PH, 
probability distributions.  These are distributions that arise or can be
interpreted  as  the  time  until  absorption  in  a  finite  Markov  chain,  and 
have  rational  Laplace  transforms.  Their  major  advantage  is 
computational;  instead  of  differential  equations,  complex  variables  and 
numerical  integration,  they  admit  of  matrix-geometric  procedures.  A 
drawback  of  PH  distributions,  however,  is  the  nonuniqueness  of 
representation.  Many  different  combinations  of  defining  parameters  lead 
to  the  same  CDF  and  many  of  these  representations  are  not  of  minimal 
order.

Theoretical  justification  for  the  use  of  approximating 
distributions  has  also  been  provided  by  work  on  the  continuity  of 
queues.  Kennedy  [1972,  1977]  and  Whitt  [1974]  have  shown  that  if  the 
interarrival  and  service-time  distributions  of  otherwise  identical 
queues are close in some sense, then the corresponding performance
measures such as queue length and waiting time will also be close in an
appropriate sense.  A very demanding technical treatment is needed to
establish these results, which requires careful definition of the
underlying spaces, metrics, convergence concepts, and topologies.  Both
authors cite the sequence of mixed Erlang distributions, introduced by
Schassberger, that converges weakly to an arbitrary CDF.  By constructing
a sequence of such general Erlang models for a given GI/G/c queue, where
the actual interarrival and service-time distributions are approximated,
the weak convergence of the two sequences of CDFs implies the weak
convergence of the corresponding performance measures.

This  concept  of  weak  convergence  of  probability  measures  has  found 
widespread  application  in  applied  probability  modeling.  Queueing  theory 
happens  to  be  the  area  in  which  most  of  the  weak  convergence  results 
have been used.  Iglehart [1973] has written a useful survey paper that
details  the  uses  of  weak  convergence  in  queueing.  Discussions  on 
continuity  of  queues  and  rates  of  convergence  are  included. 

Another  interesting  survey  paper  is  that  of  Bhat  et  al.  [1979]. 
They consider the use of approximations in queueing applications but
their definition of approximation is somewhat broader than ours.
Besides the use of approximating distributions, which they subsume under
the heading of system approximations, they examine two other classes of
approximations.  Process approximations are concerned with replacing the
physical process under study by a simpler one and include the use of
diffusion and fluid approximations.  Numerical approximation involves
methods of simplifying the arithmetic computations that arise in solving
the systems model; establishing upper and lower bounds on performance
measures and using numerical methods to invert analytically intractable
Laplace transforms are examples of this type of approximation.

This  concludes  our  brief  review  of  the  salient  developments  in  the 
use of mixed-exponential-type approximations in applied probability.
Although the emphasis has been on queueing applications, the basic
concepts have wide applicability.  While the family of mixed Erlang
distributions has certainly been the most popular class of approximating
functions, we will make a case in the sequel for considering the
generalized hyperexponential distributions.  Besides being of simple
form, which facilitates numerical manipulations, GH distributions have a
unique representation, which is desirable for such statistical procedures
as parameter estimation.  They extend the familiar hyperexponential
class  of  distributions  and  enjoy  the  analytical  benefits  of  having 
rational  Laplace  transforms.  Furthermore,  recently  developed  algorithms 
for  fitting  hyperexponential  distributions  to  empirical  data  (see  Kaylan 
and  Harris  [1981]  and  Mandelbaum  and  Harris  [1982])  can  be  readily 
generalized  to  include  GH  distributions. 


2.  RELATIONS AMONG CLASSES OF DISTRIBUTION FUNCTIONS


In  this  section,  families  of  probability  distribution  functions 
that  find  wide  use  as  approximations  to  more  general  CDFs,  for  example, 
in  queueing  applications,  are  defined  and  related  to  one  another.  The 
more  obvious  relations  are  mentioned  with  the  definitions,  while  others 
are  presented  in  following  subsections. 

Several  of  the  definitions  below  are  stated  in  terms  of  the 
one-sided Laplace-Stieltjes transform of a CDF, F.  This transform, F*,
is defined in the usual way as

$$F^*(s) = \int_0^\infty e^{-st}\, dF(t),$$


which  is  equivalent  to  the  ordinary  one-sided  Laplace  transform  of  a 
PDF,  F'(t)  =  f(t),  whenever  F(t)  is  absolutely  continuous. 

2.1  Definitions

$K_n$ Class

Smith [1953] defined the class $K_n$ to be those distribution
functions whose Laplace transform is the reciprocal of a polynomial of
$n$th degree.  Of course, not all reciprocal polynomials are transforms of
CDFs.  For instance, the real part of each polynomial root must be
negative.  While the roots may be complex, they must occur in conjugate
pairs since the corresponding CDF is real.  There are also additional
constraints that are not so obvious.  Lukacs and Szasz [1951] have shown
that one of the roots with greatest real part must be real.  Therefore,
the simplest member of $K_n$ having complex roots is of the form

$$f^*(s) = \frac{a(a^2+b^2)}{(s+a)\left[(s+a)^2+b^2\right]}$$
distributions.  For example, consider the two following distinct
phase-type representations:

$$Q = \begin{pmatrix} -3 & 1 & 1 \\ 1 & -4 & 2 \\ 1 & 0 & -6 \end{pmatrix},
\qquad \alpha = (0,\ 1/2,\ 1/2)$$

and

$$Q' = \begin{pmatrix} -2 & 0 \\ 0 & -5 \end{pmatrix},
\qquad \alpha' = (2/3,\ 1/3).$$

Clearly the two representations are different and are not of the same
order.  However, each results in the same CDF, namely,
$F(t) = 1 - \left(2e^{-2t}/3 + e^{-5t}/3\right)$.  The second
representation would be of minimal order since the CDF is a mixture of
two exponentials.
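The claim that both representations give the same CDF can be checked numerically.  The sketch below is an illustration only: it assumes the standard phase-type formula $F(t) = 1 - \alpha e^{Qt} \mathbf{e}$, takes the second generator to be the diagonal matrix with entries -2 and -5 (as implied by the stated CDF), and uses a small hand-written matrix exponential (`mat_exp`, a hypothetical helper based on scaling and squaring with a truncated Taylor series) so that no external library is needed.

```python
import math

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_exp(A, terms=30, squarings=10):
    """exp(A) via scaling and squaring with a truncated Taylor series."""
    n = len(A)
    scale = 2.0 ** squarings
    S = [[x / scale for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, S)]  # S^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def ph_cdf(t, alpha, Q):
    """F(t) = 1 - alpha * exp(Q t) * e, with e a column vector of ones."""
    E = mat_exp([[q * t for q in row] for row in Q])
    return 1.0 - sum(alpha[i] * sum(E[i]) for i in range(len(alpha)))

Q3 = [[-3.0, 1.0, 1.0], [1.0, -4.0, 2.0], [1.0, 0.0, -6.0]]
alpha3 = [0.0, 0.5, 0.5]
Q2 = [[-2.0, 0.0], [0.0, -5.0]]
alpha2 = [2.0 / 3.0, 1.0 / 3.0]

for t in (0.1, 0.5, 1.0, 3.0):
    closed = 1.0 - (2.0 * math.exp(-2.0 * t) + math.exp(-5.0 * t)) / 3.0
    print(t, ph_cdf(t, alpha3, Q3), ph_cdf(t, alpha2, Q2), closed)
```

All three columns should coincide up to rounding, even though the first representation has order three and the second has order two.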

Mixed generalized Erlang distributions also permit multiple
representations.  From the notation of Dehon and Latouche [1982] we may
represent the CDF of the sum of n independent random variables, each
exponentially distributed with parameter $\lambda_i$ ($i = 1,2,\ldots,n$),
by $F_{12\cdots n}$.  This CDF is obtained in terms of the underlying
exponentials by Equation (2.3.2).  But the two CDFs defined by

F(t)  =  (1/3)  F^  +  (2/31  F^2 

and 

Gftl  =  (1/3)  F^  +  (4/91  F^,  +  (2/91  F^^^ 
are in fact the same.  This can be seen by expressing each as a linear
combination of the underlying exponential distributions.  As discussed
above, this representation is unique and yields

F(tl  =  C(L,  -  (-1/3)  Fj  +  (.4/3)  F,j 
2.6  Uniqueness of Representation

For statistical applications, an important property of mixture-type
CDFs is uniqueness of representation, or identifiability.  Yakowitz and
Spragins [1968] define the identifiability of finite mixtures as
follows.  If $\{F_i\}$ is a collection of CDFs, then the class of finite
mixtures of the $\{F_i\}$ is said to be identifiable if the convex hull of
$\{F_i\}$ has the property that

$$\sum_{i=1}^{N} c_i F_i = \sum_{i=1}^{M} c_i' F_i'$$

where $c_i > 0$, $\sum c_i = 1$, implies N = M and that for each i
($1 \le i \le N$) there is some j ($1 \le j \le N$) such that
$c_i = c_j'$ and $F_i = F_j'$.  A necessary and sufficient condition for
identifiability is that the class $\{F_i\}$ be a linearly independent set
over the field of real numbers.  This follows from the uniqueness of
representation property of a basis in a vector space.

Since any collection of distinct exponentials is linearly
independent, the class of finite mixtures of exponential CDFs is
identifiable.  A broader concept of identifiability for generalized
mixtures also applies when the underlying family of CDFs is exponential.
A generalized mixture is one where the mixing parameters sum to unity
but can have any real values, and of course, the GH distributions are of
this form.  Again, the uniqueness of the representation of vectors with
respect to a basis for the vector space implies that GH distributions
have unique representations as linear combinations of exponentials.

Importantly, the other families of CDFs considered in this work do
not share the uniqueness of representation property with the GH


and MGE is a proper subset of PH.  The results presented in Examples
2.3.1  and  2.4.1  are  developed  more  fully  in  Botta  [1985]  where 
conditions  are  also  given  for  GH  and  PH  distributions  (with  real  roots) 
to  have  MGE  representations  of  the  same  order  as  the  GH  representation. 
These  conditions  are  readily  computed  from  the  given  distribution  and  do 
not  require  solving  for  the  coefficients. 

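2.5  Summary of Set Inclusion Relations

The results of the foregoing subsections yield the following set of
relations among the classes of distribution functions:

(1)  $\mathrm{GE} \subset K_n \subset R_n$
(2)  $\mathrm{GE} \subset \mathrm{MGE} \subset \mathrm{GH} \subset R_n$
(3)  $\mathrm{GE} \subset \mathrm{MGE} \subset \mathrm{PH} \subset R_n$
(4)  $\mathrm{PH} \not\subset K_n$;  $K_n \not\subset \mathrm{PH} \Rightarrow R_n \not\subset \mathrm{PH}$
(5)  $\mathrm{PH} \not\subset \mathrm{GH}$;  $\mathrm{GH} \not\subset \mathrm{PH}$
(6)  $\mathrm{GH} \not\subset \mathrm{MGE}$ (of same order)

These relations may be depicted in the following Venn diagram.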


in  the  subsection  on  uniqueness  of  representations,  that  it  may  be 
possible  to  obtain  a  MGE  representation  by  embedding  the  problem  in  a 
higher  order  space  even  when  there  is  no  valid  MGE  representation  in  the 
original  space. 


2.4  MGE and PH

We established in subsection 2.1 that all MGE distributions are
phase type.  Since PH distributions may include trigonometric terms, it
is clear that the MGE distributions are a proper subset of PH.  But what
if the PH generator matrix is allowed to have only real eigenvalues?  Is
the resulting subclass of PH distributions contained in MGE?  The answer
is no.  We obtain this result by way of a counterexample.

Example 2.4.1  The PH distribution given by

$$F(t) = 1 - \left(1.293\, e^{-4.846t} - .343\, e^{-4.1948t} + .050\, e^{-.959t}\right)$$

was obtained from the generator matrix

$$Q = \begin{pmatrix} -5 & 0 & 1/8 \\ 4 & -4 & 0 \\ 0 & 1 & -1 \end{pmatrix}$$

with $\alpha = (1,0,0)$.  As before, equating F(t) to
$b_1 F_1(t) + b_2 F_{12}(t) + b_3 F_{123}(t)$ and solving for the
$\{b_i\}$ yields the result that $b_2 = -.0369$.  Since each $b_i$ must be
nonnegative, we do not have a valid MGE representation.  Thus, PH
distributions with real roots do not necessarily belong to MGE.  In
other words,

$$\mathrm{PH}\ (\text{real roots}) \not\subset \mathrm{MGE}.$$
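The negative weight in Example 2.4.1 can be reproduced with a short back-substitution.  The sketch below is an illustration under stated assumptions: it uses the standard partial-fraction form of the survival function of a sum of independent exponentials with distinct rates, $1 - F_{12\cdots i}(t) = \sum_{j \le i} C_{ij} e^{-\lambda_j t}$ with $C_{ij} = \prod_{k \le i,\, k \ne j} \lambda_k/(\lambda_k - \lambda_j)$, and the helper names `ge_coeff` and `mge_weights` are hypothetical.

```python
def ge_coeff(rates, i, j):
    """Coefficient of exp(-rates[j]*t) in the survival function of the sum
    of the first i+1 exponentials (0-based indices, distinct rates)."""
    c = 1.0
    for k in range(i + 1):
        if k != j:
            c *= rates[k] / (rates[k] - rates[j])
    return c

def mge_weights(a, rates):
    """Solve a_j = sum_{i >= j} b_i * C_ij for b by back-substitution."""
    n = len(a)
    b = [0.0] * n
    for j in range(n - 1, -1, -1):
        acc = a[j] - sum(b[i] * ge_coeff(rates, i, j) for i in range(j + 1, n))
        b[j] = acc / ge_coeff(rates, j, j)
    return b

# Coefficients and rates read from Example 2.4.1 (rates in decreasing order).
a = (1.293, -0.343, 0.050)
rates = (4.846, 4.1948, 0.959)

b = mge_weights(a, rates)
print([round(x, 4) for x in b])
```

The middle weight comes out negative, in line with the value $-.0369$ quoted in the example, so no three-stage MGE representation with these rates exists.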
By substituting (2.3.2) in (2.3.3), a triangular system of linear
equations relating the coefficients $\{a_i\}$ and $\{b_i\}$ is obtained.
This system of equations is readily inverted to yield the $\{b_i\}$ in
terms of the $\{a_i\}$.  For the case of n = 3, it turns out that $b_1$ and
$b_3$ are always nonnegative for any choice of $\{a_i\}$ corresponding to a
GH distribution.  The nonnegativity of $b_2$ requires that

$$a_2 \ge -\frac{\lambda_3(\lambda_1-\lambda_3)}{\lambda_2(\lambda_1-\lambda_2)}\, a_3 . \qquad (2.3.4)$$

The next example shows that GH distributions exist for which
(2.3.4) is violated.

Example 2.3.1  Consider the GH CDF

$$F(t) = 1 - \left(6e^{-4t} - 13e^{-3t} + 8e^{-2t}\right).$$

Here

$$a_1 = 6,\quad a_2 = -13,\quad a_3 = 8, \qquad
\lambda_1 = 4,\quad \lambda_2 = 3,\quad \lambda_3 = 2.$$

Therefore

$$-\frac{\lambda_3(\lambda_1-\lambda_3)}{\lambda_2(\lambda_1-\lambda_2)}\, a_3 = -\frac{32}{3}.$$

Since $a_2 < -32/3$, we see that (2.3.4) is violated and thus that no MGE
representation exists for F(t).  This example establishes that

$$\mathrm{GH} \not\subset \mathrm{MGE},$$

and that the class of MGE distributions is thus a proper subset of the
class of GH distributions.
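The arithmetic in Example 2.3.1 can be verified exactly with rational numbers.  The sketch below is illustrative: it evaluates the right-hand side of (2.3.4), read as $a_2 \ge -\lambda_3(\lambda_1-\lambda_3) a_3 / [\lambda_2(\lambda_1-\lambda_2)]$, and also confirms that F is a genuine CDF by writing the density in terms of $x = e^{-t}$ as $x^2(24x^2 - 39x + 16)$ and checking that the quadratic factor has no real roots.  The variable names are assumptions made for this illustration.

```python
from fractions import Fraction

a1, a2, a3 = Fraction(6), Fraction(-13), Fraction(8)
l1, l2, l3 = Fraction(4), Fraction(3), Fraction(2)

# Right-hand side of condition (2.3.4) and whether a_2 satisfies it.
threshold = -(l3 * (l1 - l3)) / (l2 * (l1 - l2)) * a3
condition_holds = a2 >= threshold

# Density f(t) = 24 e^{-4t} - 39 e^{-3t} + 16 e^{-2t} = x^2 (24 x^2 - 39 x + 16)
# with x = e^{-t}; a negative discriminant and a positive leading
# coefficient mean the quadratic factor is positive for every x.
c4, c3, c2 = a1 * l1, a2 * l2, a3 * l3
discriminant = c3 ** 2 - 4 * c4 * c2

print("threshold:", threshold)
print("(2.3.4) satisfied:", condition_holds)
print("discriminant of quadratic factor:", discriminant)
```

The density is therefore strictly positive, so F belongs to GH, while the failure of (2.3.4) rules out an MGE representation of order three.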

The above result holds when the order of the MGE representation
must be the same as that of the GH distribution.  We demonstrate below,
where the $A_i$ are real.  Any mixture of such distributions has a
transform of the same form.  Therefore any mixed generalized Erlang
distribution is in GH and

$$\mathrm{MGE} \subset \mathrm{GH}. \qquad (2.3.1)$$

Based upon results in Dehon and Latouche [1982], we next
demonstrate the existence of GH distributions that cannot be represented
as MGEs of the same order.  They show that any GE distribution
constructed from a subset of exponential distributions, $\{F_i\}$, can be
expressed as a random combination of the GE distributions
$F_1, F_{12}, \ldots, F_{12\cdots n}$, where $F_{12\cdots i}$ is the
distribution of the sum of the first i independent exponential random
variables.  Each such distribution function can be written as

$$F_{12\cdots i}(t) = \sum_{j=1}^{i} C_{ij}\, F_j(t) \qquad (t > 0) \qquad (2.3.2)$$

where $F_j(t) = 1 - e^{-\lambda_j t}$ and
$C_{ij} = \prod_{k \le i,\, k \ne j} \lambda_k/(\lambda_k-\lambda_j)$.  (It has
been assumed without loss of generality that
$\lambda_1 > \lambda_2 > \cdots > \lambda_n$.)  Since the $C_{ij}$ are
constants, (2.3.2) is in the form of a GH distribution whose coefficients
are determined by the $\{\lambda_i\}$, which agrees with (2.3.1).  In
order for a GH distribution,
$F(t) = 1 - \sum_{i=1}^{n} a_i e^{-\lambda_i t}$, to have a MGE
representation, there must exist a set of nonnegative numbers
$\{b_i,\ i = 1,2,\ldots,n\}$ which sum to one and satisfy the equation

$$F(t) = \sum_{i=1}^{n} b_i\, F_{12\cdots i}(t). \qquad (2.3.3)$$
Because of the trigonometric terms, F(t) is clearly not in GH.  So

$$\mathrm{PH} \not\subset \mathrm{GH}.$$

But does every GH distribution have a PH representation?  The
answer is no.  As mentioned earlier, the density function corresponding
to any PH distribution is strictly positive for all t > 0.  The
following example exhibits a GH distribution that violates this
condition.

Example 2.2.2  Consider the GH distribution defined by

$$F(t) = 1 - \left(4e^{-t} - 6e^{-2t} + 3e^{-3t}\right)$$

with corresponding density

$$f(t) = F'(t) = 4e^{-t} - 12e^{-2t} + 9e^{-3t} = e^{-t}\left(2 - 3e^{-t}\right)^2.$$

It can easily be shown that f(t) = 0 at t = ln(3/2) and that f(t) > 0
for all other values of $t \ge 0$.  Therefore, $F(t) \notin \mathrm{PH}$ and

$$\mathrm{GH} \not\subset \mathrm{PH}.$$
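The interior zero of the density in Example 2.2.2 is easy to confirm numerically.  The sketch below takes the density to be $f(t) = 4e^{-t} - 12e^{-2t} + 9e^{-3t}$ and compares it with the factored form $e^{-t}(2 - 3e^{-t})^2$, which makes the zero at $t = \ln(3/2)$ and the nonnegativity elsewhere apparent.  The function names are illustrative assumptions.

```python
import math

def density(t):
    """f(t) = 4 e^{-t} - 12 e^{-2t} + 9 e^{-3t}."""
    return 4.0 * math.exp(-t) - 12.0 * math.exp(-2.0 * t) + 9.0 * math.exp(-3.0 * t)

def factored(t):
    """Equivalent form e^{-t} (2 - 3 e^{-t})^2, nonnegative by inspection."""
    return math.exp(-t) * (2.0 - 3.0 * math.exp(-t)) ** 2

t_zero = math.log(1.5)            # where 3 e^{-t} = 2
grid = [k * 0.01 for k in range(1001)]

print("f(ln 3/2) =", density(t_zero))
print("max |f - factored| on grid:", max(abs(density(t) - factored(t)) for t in grid))
print("min f on grid:", min(density(t) for t in grid))
```

Because a non-trivial PH density is strictly positive on $(0,\infty)$, a single interior zero is enough to exclude this CDF from PH.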


2.3  MGE and GH

Recall that the generalized Erlang (GE) distributions have Laplace
transforms

$$\prod_{i=1}^{n} \frac{\lambda_i}{s+\lambda_i}$$

where the $\lambda_i$ are distinct.  Using a partial fraction expansion,
this transform can be written as

$$\sum_{i=1}^{n} \frac{A_i}{s+\lambda_i}$$


boundary  equation  can  be  easily  used  to  determine  if  a  candidate 
exponential  sum  is  in  fact  in  GH.  For  sums  of  more  than  three 
exponential  terms,  the  boundary  equation  could  be  determined  in  similar 
fashion  but  would  be  very  involved  and  still  not  of  much  practical  use 
in  determining  membership  in  GH. 
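Since no simple necessary and sufficient test for membership in GH is available, a practical screen can combine the necessary conditions stated in this section with a numerical check of the density.  The sketch below is a heuristic under stated assumptions, not a proof of membership: `passes_gh_screen` is a hypothetical helper, the grid bounds are arbitrary, and the coefficients follow the form $F(t) = 1 - \sum a_i e^{-\lambda_i t}$.

```python
import math

def gh_density(t, a, rates):
    """f(t) = sum_i a_i * lambda_i * exp(-lambda_i * t)."""
    return sum(ai * li * math.exp(-li * t) for ai, li in zip(a, rates))

def passes_gh_screen(a, rates, t_max=50.0, steps=5000, tol=1e-12):
    """Necessary conditions plus a grid check that the density is nonnegative."""
    if abs(sum(a) - 1.0) > 1e-9 or min(rates) <= 0.0:
        return False
    # Monotonicity at the origin: f(0) = sum a_i * lambda_i >= 0.
    if sum(ai * li for ai, li in zip(a, rates)) < -tol:
        return False
    # The slowest-decaying exponential must carry a positive coefficient.
    slowest = min(range(len(rates)), key=lambda i: rates[i])
    if a[slowest] <= 0.0:
        return False
    # Heuristic only: nonnegativity is checked on a finite grid.
    return all(gh_density(k * t_max / steps, a, rates) >= -tol
               for k in range(steps + 1))

print(passes_gh_screen((3.0, -3.0, 1.0), (1.0, 2.0, 3.0)))   # generalized Erlang example
print(passes_gh_screen((6.0, -13.0, 8.0), (4.0, 3.0, 2.0)))  # Example 2.3.1
print(passes_gh_screen((-1.0, 2.0), (1.0, 2.0)))             # negative tail coefficient
print(passes_gh_screen((3.0, -2.0), (1.0, 3.0)))             # f(0) < 0
```

A candidate that fails the screen is certainly not in GH; one that passes has only survived a finite-grid check and may still need an analytical argument.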

We next develop some additional relations between the classes $K_n$,
$R_n$, GE, MGE, PH, and GH.


2.2  GH  and  PH 

From  the  preceding  subsection  we  know  that  all  PH  distributions  are 
in $R_n$.  But if the roots of the denominator polynomial are complex, the
corresponding  distribution  will  not  belong  to  GH.  The  following  example 
displays  such  a  PH  distribution. 

Example 2.2.1  Consider the 3x3 generator matrix

$$Q = \begin{pmatrix} -1 & 1 & 0 \\ 1 & -2 & 1 \\ 1 & 0 & -3 \end{pmatrix}.$$

The eigenvalues of Q, which are equal to the roots of the denominator
polynomial of the Laplace transform of $e^{Qt}$, are

$$-.2307 \quad \text{and} \quad -2.8846 \pm .5897\, i$$

where $i = \sqrt{-1}$.  The resulting PH distribution corresponding to an
initial state vector $\alpha = (1,0,0)$ is

$$F(t) = 1 - 1.1729\, e^{-.2307t}
       + \left[.1729 \cos .5897t + .3868 \sin .5897t\right] e^{-2.8846t}.$$
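The eigenvalues quoted in Example 2.2.1 can be checked by substituting them into the characteristic polynomial $\det(sI - Q)$.  The sketch below is a minimal illustration using Python's built-in complex numbers; the helper names `det3` and `char_poly` are assumptions, and the complex pair is taken as $-2.8846 \pm .5897\,i$, which is consistent with the trace of Q being $-6$.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly(Q, s):
    """det(sI - Q) for a 3x3 matrix Q and a real or complex s."""
    M = [[(s if r == c else 0.0) - Q[r][c] for c in range(3)] for r in range(3)]
    return det3(M)

Q = [[-1.0, 1.0, 0.0],
     [1.0, -2.0, 1.0],
     [1.0, 0.0, -3.0]]

roots = [complex(-0.2307, 0.0), complex(-2.8846, 0.5897), complex(-2.8846, -0.5897)]

for s in roots:
    print(s, abs(char_poly(Q, s)))   # residuals should be close to zero

print("sum of roots:", sum(roots).real, " trace of Q:", sum(Q[k][k] for k in range(3)))
```

The residuals are small rather than exactly zero because the quoted roots are rounded to four decimal places.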


Note that, unlike the usual hyperexponential distribution, we do not
require that each $a_i$ be nonnegative.  This added freedom makes the GH
distributions extremely versatile.  Indeed, in the following section, we
derive the critical characterization that any CDF on $[0,\infty)$ can be
approximated as closely as desired with respect to an appropriate metric
by a member of GH.

The Laplace transform of a GH distribution has the form

$$\sum_{i=1}^{n} \frac{a_i \lambda_i}{s+\lambda_i}$$

so we immediately note that

$$\mathrm{GH} \subset R_n. \qquad (2.1.7)$$

Of course, not all linear combinations of exponentials of the form
$1 - \sum_{i=1}^{n} a_i e^{-\lambda_i t}$ with $\lambda_i > 0$ and
$\sum_{i=1}^{n} a_i = 1$ are GH distributions.  For example, the
monotonicity condition requires that $\sum_{i=1}^{n} a_i \lambda_i \ge 0$.
Also, assuming $\lambda_1$ to be the smallest of the $\lambda_i$, the
corresponding coefficient $a_1$ must be positive to ensure proper
asymptotic behavior as $t \to \infty$.  Bartholomew [1969] has established
a number of sufficient conditions for a linear combination of
exponentials to be a GH distribution, but no simple set of conditions
that are both necessary and sufficient is known.  Dehon and Latouche
[1982] have recently characterized the class of GH distributions by
deriving a parametric equation of the boundary of the convex region
constituting GH for the case n = 3.  The geometric representation is
obtained by choosing a set of basis vectors from the class of all GH
distributions composed of linear combinations of three exponentials.  It
does not appear that the


yields rational expressions for each component of V*(s).  Therefore, the
probability distribution of each state belongs to $R_n$, as does the
distribution of the time until absorption.  We have, therefore, the
relation

$$\mathrm{PH} \subset R_n. \qquad (2.1.6)$$

Phase-type distributions can easily be constructed with Laplace
transforms which are not reciprocal polynomials, so that
$\mathrm{PH} \not\subset K_n$.  But is it possible that every $K_n$
distribution has a PH representation?  The answer is no.  Corollary 2.2.1
in Neuts [1981] establishes that any non-trivial PH distribution has a
corresponding density function that is strictly positive for all t > 0.
The PDF given earlier as (2.1.1) has a reciprocal-polynomial Laplace
transform but the density function is zero wherever cos bt = 1.
Therefore, the corresponding distribution function is not in PH.  We
have then that $K_n \not\subset \mathrm{PH}$, which implies that
$R_n \not\subset \mathrm{PH}$ and that PH is thus a proper subset of $R_n$.

It should be noted that, given an arbitrary CDF, there is no easy
way to determine if it is in PH.  One must search for a suitable
generator matrix and set of initial conditions that will yield the
desired distribution.

GH  Class 

The generalized hyperexponential distributions are CDFs of the form

$$1 - \sum_{i=1}^{n} a_i e^{-\lambda_i t}$$

with $\lambda_i > 0$ and real, $\sum_{i=1}^{n} a_i = 1$ and $a_i$ real.


It  should  be  noted  that  PH  representations  are  not  unique.  That 
is,  there  may  exist  many  different  generator  matrices  of  different 
orders  that  lead  to  the  same  CDF.  Examples  are  given  below  in 
subsection  2.6.  The  problem  of  finding  minimal  representations  of  PH 
distributions,  that  is,  where  the  order  of  Q  is  as  small  as  possible, 
has  not  been  solved.  Neuts  [1981]  established  that  the  class  of  PH 
distributions  is  closed  under  convolution  and  finite  mixtures,  though  in 
general,  infinite  mixtures  of  PH  distributions  are  not  of  phase  type. 
However,  if  the  mixing  probabilities  are  discrete  phase  type,  then  the 
infinite  mixture  is  also  of  phase-type. 

From the preceding discussion it follows that MGE distributions are
phase type, i.e.,

$$\mathrm{MGE} \subset \mathrm{PH}.$$

The representation (2.1.4) of a PH distribution was obtained from
the distribution functions, v(t), of the individual states of the
underlying Markov chain, which are the solutions of

$$\frac{dv(t)}{dt} = v(t)\, Q. \qquad (2.1.5)$$

The solution to this equation is $v(t) = v(0)e^{Qt} = \alpha e^{Qt}$.
Taking the Laplace transform of (2.1.5) yields

$$sV^*(s) - v(0) = V^*(s)\, Q,$$

so that

$$V^*(s)\,(sI-Q) = v(0) = \alpha$$

or

$$V^*(s) = \alpha\,(sI-Q)^{-1}.$$

Thus $(sI-Q)^{-1}$ is the Laplace transform of $e^{Qt}$, and each term in
the inverse matrix of sI-Q is a rational expression.  Multiplication by
$\alpha$


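$$Q = \begin{pmatrix}
-q_{11} & q_{12} & \cdots & q_{1n} \\
q_{21} & -q_{22} & \cdots & q_{2n} \\
\vdots & \vdots & & \vdots \\
q_{n1} & q_{n2} & \cdots & -q_{nn}
\end{pmatrix},
\qquad q_{ij} \ge 0 \ (i \ne j), \qquad
-q_{ii} + \sum_{j \ne i} q_{ij} \le 0 .$$
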
This generator matrix corresponds to an (n+1)-state Markov chain with
state (n+1) being the absorbing barrier.  The vector
$\alpha = (\alpha_1, \ldots, \alpha_n)$ is the vector of initial state
probabilities at t = 0, and the vector $\mathbf{e}$ is an n-dimensional
column vector of all ones.  The entries, $q_{ij}$, in the generator matrix
represent the instantaneous rate of the transition from state i to
state j.  Two examples of distribution functions with PH
representations follow.

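Example 2.1.1  The GE distribution of order n with parameters
$\lambda_1, \ldots, \lambda_n$ has the representation
$\alpha = (1,0,0,\ldots,0)$ and

$$Q = \begin{pmatrix}
-\lambda_1 & \lambda_1 & 0 & \cdots & 0 \\
0 & -\lambda_2 & \lambda_2 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -\lambda_{n-1} & \lambda_{n-1} \\
0 & 0 & \cdots & 0 & -\lambda_n
\end{pmatrix}.$$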

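Example 2.1.2  The mixed exponential distribution

$$F(t) = \sum_{i=1}^{n} a_i \left(1 - e^{-\lambda_i t}\right)$$

has the representation $\alpha = (a_1, a_2, \ldots, a_n)$ and

$$Q = \begin{pmatrix}
-\lambda_1 & 0 & \cdots & 0 \\
0 & -\lambda_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & -\lambda_n
\end{pmatrix}.$$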


When combined into a single fraction, this becomes the quotient of two
polynomials, the degree of the denominator being n and the degree of the
numerator n-1.  This motivates the definition of $R_n$ as the class of
distributions whose transforms are rational, with n being both the
degree of the denominator polynomial and the maximal degree of the
numerator polynomial.  We have therefore established that the class of
mixed generalized Erlang distributions, denoted by MGE, is contained in
$R_n$.  Cox [1955] points out that both the convolution and the mixture of
any pair of distributions in $R_n$ yields another distribution with
rational Laplace transform.  Furthermore, all distributions in $R_n$ are
continuous except for possible atoms at the origin, and the corresponding
density function is positive everywhere in $(0,\infty)$ except at isolated
points.  Finally, it is obvious that

$$K_n \subset R_n. \qquad (2.1.3)$$


PH  Class 

Neuts  [1975,  1981]  has  popularized  a  class  of  distribution 
functions  that  he  refers  to  as  phase  type,  or  PH,  distributions.  A  CDF 
is  said  to  be  of  phase  type  if  it  arises  as  the  time  until  absorption  in 
a finite-state continuous-time Markov chain.  That is, F is phase type
if it can be written as

$$F(t) = 1 - \alpha\, e^{Qt}\, \mathbf{e} \qquad (2.1.4)$$

where Q is the generator matrix and has the form


corresponding to the PDF

$$f(t) = a b^{-2}(a^2 + b^2)\, e^{-at}\, (1 - \cos bt) \qquad (a > 0). \qquad (2.1.1)$$

Clearly, the ordinary exponential distribution belongs to $K_n$.  Since
the Laplace transform of the distribution of a sum of independent random
variables is the product of the Laplace transforms of their individual
distributions, it follows that the generalized Erlang CDFs corresponding
to a sum of independent, exponentially distributed random variables with
distinct parameters are also in $K_n$.  These generalized Erlangs, denoted
GE, have transforms of the form

$$\prod_{i=1}^{n} \frac{\lambda_i}{s+\lambda_i} \qquad (\lambda_i > 0)$$

where $\lambda_i/(s+\lambda_i)$ is the transform of an exponential CDF
having mean $1/\lambda_i$.  If all the random variables are identically
distributed, the resulting distribution is the (simple) Erlang of degree
n, $E_n(\lambda)$, and its Laplace transform is just
$\lambda^n/(s+\lambda)^n$.  Therefore we see that $E_n(\lambda) \in K_n$ and

$$\mathrm{GE} \subset K_n. \qquad (2.1.2)$$


$R_n$ Class

While $K_n$ contains GE, it does not contain mixtures of GE CDFs, i.e.,
distributions of the form $\sum_{i=1}^{n} a_i F_i$ with $a_i > 0$, $\sum_{i=1}^{n} a_i = 1$ and $F_i \in$ GE.
For example, suppose each $F_i$ is exponential. By the linearity of the
Laplace transform, the transform of $\sum_{i=1}^{n} a_i F_i$ is

$$ \sum_{i=1}^{n} a_i \frac{\lambda_i}{s + \lambda_i}. $$
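To make the degrees of the resulting rational function concrete, the following sketch places the transform of an exponential mixture over a common denominator. It is only an illustration: the helper names are hypothetical and polynomials are stored as coefficient lists, lowest degree first.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add two polynomials, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def mixture_transform(weights, rates):
    """Return (numerator, denominator) of sum_i a_i * lam_i / (s + lam_i)."""
    denominator = [1]
    for lam in rates:
        denominator = poly_mul(denominator, [lam, 1])  # factor (s + lam)
    numerator = [0]
    for i, (a, lam) in enumerate(zip(weights, rates)):
        term = [a * lam]
        for j, other in enumerate(rates):
            if j != i:
                term = poly_mul(term, [other, 1])      # all factors except (s + lam_i)
        numerator = poly_add(numerator, term)
    return numerator, denominator


# Equal mixture of exponentials with rates 1 and 3:
num, den = mixture_transform([Fraction(1, 2), Fraction(1, 2)], [1, 3])
# num represents 3 + 2s, den represents 3 + 4s + s^2
```

With n = 2 distinct rates the denominator has degree 2 and the numerator degree 1, and the ratio equals 1 at s = 0, as it must for a proper distribution.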




As in the PH example, one of the MGE representations is not of minimal
order.

For most applications, such as curve fitting, non-uniqueness of
representation is a disadvantage. We now discuss a situation, mentioned
in subsection 2.3, where obtaining a representation of non-minimal order
may be useful. Suppose we have a GH distribution that does not have an
MGE representation of minimal order. It may be possible to embed the
distribution in a higher order space in such a way that an MGE
representation is obtained. We illustrate the procedure via an example.
Example  2.6.1  Consider  the  GH  distribution 


F(t)  =  1 


.  13  -7t  ^ 

(--  e  F 


77 

12 


-4t 

e 


35  -3t  ^  21  -2t, 

-ye  +  —  e  ) 
4  3 


Here $\lambda_1 = 7$, $\lambda_2 = 4$, $\lambda_3 = 3$, $\lambda_4 = 2$. Dehon and Latouche [1982]
established that an MGE representation exists if, and only if, there
exists a set of coefficients $\{b_i,\ i = 1,2,3,4\}$ such that




We must now solve for the coefficients $\{b_i\}$ from

$$ F(t) = b_1 F'_{1} + b_2 F'_{12} + b_3 F'_{123} + b_4 F'_{1234} + b_5 F'_{12345}, $$

where the primes indicate that the corresponding terms are defined with
respect to the $\{\lambda'_i\}$. It turns out that there is a solution for the $\{b_i\}$
that results in the representation

$$ F(t) = \tfrac{1}{4} F'_{1} + \tfrac{1}{3} F'_{12} + \tfrac{1}{24} F'_{123} + \tfrac{1}{24} F'_{1234} + \tfrac{1}{3} F'_{12345}. $$


Not only does this give us an MGE representation, it also confirms that
the original F(t) is in fact a valid CDF since it can be expressed as a
mixture of CDFs.

This example raises the question of whether it is possible to
obtain an MGE representation for every GH distribution. The answer, of
course, is no since all MGEs are of phase type and we have seen that
there exist GHs that are not members of PH. A fuller discussion of the
representation of GH distributions as MGEs, including a set of necessary
and sufficient conditions that do not require solving for the $\{b_i\}$
coefficients, is contained in Botta [1985].

The uniqueness property provides a strong rationale for our
interest in the GH class of distributions. We turn next to an
examination of their suitability for providing approximations to
arbitrary distribution functions.
3. DENSENESS RESULTS FOR GH DISTRIBUTIONS

In this section we establish a major justification for our interest
in the class of generalized hyperexponential distributions by showing
that GH CDFs are dense in the class of all cumulative distribution
functions on the nonnegative real line. That is, any CDF can be
approximated arbitrarily closely (with respect to some metric) by a
member of GH. The result eventually follows from a similar result for
Erlang mixtures (see, for example, Schassberger [1970], Whitt [1974],
and Kennedy [1977]). A theorem from functional analysis concerning the
approximation of a continuous function by an exponential sum is first
extended to show that a certain class of probability density functions
can be approximated by a GH density. Several intermediate results then
lead to the desired denseness property of the class GH.

3.1 Denseness of Erlang Mixtures in the Topology of Weak Convergence

Consider an arbitrary CDF F(t) on $[0,\infty)$. Define a sequence of
general Erlang CDFs by

$$ F_n(t) = F(0) + \sum_{k=1}^{\infty} \left[ F\!\left(\tfrac{k}{n}\right) - F\!\left(\tfrac{k-1}{n}\right) \right] E_n^k(t) \qquad (t > 0) \tag{3.1.1} $$

where $E_n^k(t)$ is the k-fold convolution of the exponential CDF with
mean 1/n. Schassberger [1970], Whitt [1974], and Kennedy [1977] state
that the sequence $\{F_n\}$ converges weakly to F. That is, $F_n(t)$ converges
to F(t) at each continuity point of F.
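The Erlang mixture (3.1.1) can be evaluated numerically along the following lines. This is a minimal sketch under stated assumptions: the function name and the truncation point `k_max` are hypothetical, and the Erlang CDF is computed through the standard identity that the k-fold convolution of exponentials with rate n equals P(N >= k) for a Poisson variable N with mean nt.

```python
import math


def erlang_mixture_cdf(F, n, t, k_max):
    """Evaluate F_n(t) of (3.1.1), truncating the infinite sum after k_max terms.

    F     : callable CDF on [0, infinity)
    n     : index of the approximant (each exponential stage has mean 1/n)
    k_max : hypothetical truncation point, chosen by the caller
    """
    if t <= 0.0:
        return F(0.0)
    x = n * t                      # mean of the Poisson count N
    log_x = math.log(x)
    value = F(0.0)
    poisson_cdf = 0.0              # running value of P(N <= k-1)
    for k in range(1, k_max + 1):
        # add P(N = k-1) = exp(-x) x^(k-1) / (k-1)!, computed in log space
        poisson_cdf += math.exp((k - 1) * log_x - x - math.lgamma(k))
        erlang_k = max(0.0, 1.0 - poisson_cdf)    # E_n^k(t) = P(N >= k)
        value += (F(k / n) - F((k - 1) / n)) * erlang_k
    return value
```

For a unit step at t = 1, which has a discontinuity, calling the function with n = 100 gives values near 0 at t = 0.5 and near 1 at t = 1.5, in line with convergence at continuity points only.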

The notion of weak convergence induces a topology on the space of
CDFs. The resulting topological space can also be generated by a number
of metrics that measure the distance between any pair of CDFs.
Convergence with respect to these metrics is then equivalent to
topological convergence. The resulting convergence in distribution,
though weaker than the classical concepts of pointwise and uniform
convergence, is useful for probabilistic modeling in situations where
the stronger notions of convergence often fail. This occurs, for
example, when the CDFs of interest have points of discontinuity.

A useful example of a metric defined on the space of CDFs is
provided by the Levy distance. If F(t) and G(t) are two distribution
functions, the Levy distance between them, denoted as L(F,G), is defined
as

$$ L(F,G) = \inf_{\varepsilon > 0} \{\, \varepsilon \mid \text{for all } t,\ F(t-\varepsilon) - \varepsilon \le G(t) \le F(t+\varepsilon) + \varepsilon \,\}. $$
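The definition lends itself to a simple numerical approximation. The sketch below is an illustration only: the function names are hypothetical, the condition is checked on a finite grid of t values rather than for all t, and bisection is used because the condition, once true for some epsilon, stays true for every larger epsilon.

```python
def levy_condition(F, G, eps, grid):
    """Check F(t - eps) - eps <= G(t) <= F(t + eps) + eps at every grid point."""
    return all(F(t - eps) - eps <= G(t) <= F(t + eps) + eps for t in grid)


def levy_distance(F, G, grid, tol=1e-6):
    """Approximate the Levy distance by bisection on eps.

    The result is exact only up to the grid spacing and the tolerance tol.
    """
    lo, hi = 0.0, 1.0          # eps = 1 always satisfies the condition for CDFs
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if levy_condition(F, G, mid, grid):
            hi = mid
        else:
            lo = mid
    return hi


# Two unit steps whose jumps are 0.2 apart: far apart in the sup metric
# (distance 1) but close in the Levy metric (distance about 0.2).
grid = [i * 0.001 for i in range(3001)]
step_at_1 = lambda t: 1.0 if t >= 1.0 else 0.0
step_at_1_2 = lambda t: 1.0 if t >= 1.2 else 0.0
d = levy_distance(step_at_1, step_at_1_2, grid)
```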


This analytic definition has an intuitive geometric interpretation. In
the graphs of y = F(t) and y = G(t), vertical line segments are drawn at
the points of discontinuity to produce two continuous curves. Let P and
Q be the points on these curves that form the intersection of the curves
with the line t + y = c. This is illustrated below.

[Figure: the curves y = F(t) and y = G(t) cut by the line t + y = c at the points P and Q.]

Denoting by $\overline{PQ}$ the Euclidean distance between P and Q, the Levy distance
can be expressed as

$$ L(F,G) = \sup_{c} \frac{\overline{PQ}}{\sqrt{2}}. $$

This definition illustrates that two CDFs can be close in the Levy sense
if their points of discontinuity are close "horizontally" (i.e.,
$|t_1 - t_2|$ is small), even though they may not be close "vertically,"
that is, with respect to the usual sup metric which requires that
$|F(t) - G(t)|$ be small for all values of t.

The  connection  between  weak  convergence  and  convergence  with 
respect  to  the  Levy  metric  is  established  by  the  following  theorem  from 
Lukacs  [1975]  which  is  stated  here  without  proof.  The  geometric 
interpretation  of  L  given  above  is  from  the  same  source  and  a  proof  of 
the  theorem  appears  there  as  well. 

Theorem 3.1.1: The sequence of CDFs $\{F_n(t)\}$ converges weakly to the
CDF F(t) if, and only if, $\lim_{n\to\infty} L(F_n, F) = 0$.

It  is  important  to  note  that  the  common  statement  that  "a  class  of 
CDFs  is  dense  in  the  class  of  all  CDFs"  generally  is  taken  in  the  sense 
of  the  usual  topology  of  weak  convergence.  That  is  the  manner  in  which 
the  Erlang  mixtures  of  (3.1.1)  are  dense  in  the  class  of  all  CDFs  with 
support on the nonnegative real line.

3.2 Approximating with Exponential Sums

In this subsection we establish that a continuous function on $[0,\infty)$
that vanishes at infinity can be uniformly approximated by a sum of
exponential terms of the form

$$ \sum_{i} a_i e^{-\lambda_i t} \qquad (\lambda_i > 0). $$

The result follows from the extension to an infinite domain of the
famous Weierstrass polynomial approximation theorem. We present first
the case where the $\lambda_i$ are integers and then a generalization to
arbitrary $\lambda_i$. The following lemma from Apostol [1974] is stated without
proof.

Lemma 3.2.1 If f is continuous on $[0,\infty)$ and if $f(t) \to a$ as $t \to \infty$,
then f can be uniformly approximated on $[0,\infty)$ by a
function of the form $g(t) = p(e^{-t})$, where p is a
polynomial.
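One way to see why such a g exists is the change of variable x = e^{-t}, which maps $[0,\infty)$ onto (0,1] and turns f into a function h(x) = f(-ln x) that extends continuously to x = 0 with h(0) = a. The sketch below approximates h by a Bernstein polynomial; this is a substituted constructive device, not the proof given by Apostol, and the function name is hypothetical.

```python
import math


def bernstein_in_exp(f, limit, m):
    """Return g(t) = p(e^{-t}), where p is the degree-m Bernstein polynomial of
    h(x) = f(-ln x) on [0, 1], with h(0) set to the limit of f at infinity."""
    def h(x):
        return limit if x == 0.0 else f(-math.log(x))

    nodes = [h(k / m) for k in range(m + 1)]   # samples of h at k/m

    def g(t):
        x = math.exp(-t)
        return sum(nodes[k] * math.comb(m, k) * x ** k * (1.0 - x) ** (m - k)
                   for k in range(m + 1))

    return g


# Example: f(t) = t e^{-t} is continuous and tends to 0 at infinity.
f = lambda t: t * math.exp(-t)
g = bernstein_in_exp(f, 0.0, 200)
worst = max(abs(f(0.05 * i) - g(0.05 * i)) for i in range(401))
```

Expanding the Bernstein polynomial in powers of x gives exactly a constant term plus a sum of terms in $e^{-kt}$, k = 1, ..., m, which is the form used in the next lemma.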

We now show that if the continuous function being approximated vanishes
at infinity, the constant term in the approximating exponential sum can
be set equal to zero.

Lemma 3.2.2 If f is continuous on $[0,\infty)$ and if $f(t) \to 0$ as $t \to \infty$,
then f can be uniformly approximated on $[0,\infty)$ by an
exponential sum of the form

$$ \sum_{k=1}^{n} a_k e^{-kt}. $$

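Proof: By Lemma 3.2.1, f can be uniformly approximated by a sum of
the form

$$ a_0 + \sum_{k=1}^{n} a_k e^{-kt}. $$

Thus we have only to show that $a_0$ may be chosen to be zero. For
$\varepsilon > 0$, let

$$ f_\varepsilon = a_0(\varepsilon) + \sum_{k=1}^{n(\varepsilon)} a_k(\varepsilon) e^{-kt} $$

uniformly approximate f, that is, $|f_\varepsilon - f| \le \varepsilon$ for all $t \in [0,\infty)$.
Now consider

$$ |a_0(\varepsilon)| = \Big| f_\varepsilon - \sum_{k=1}^{n(\varepsilon)} a_k(\varepsilon) e^{-kt} \Big| = \Big| f_\varepsilon - f + f - \sum_{k=1}^{n(\varepsilon)} a_k(\varepsilon) e^{-kt} \Big|. $$

Thus

$$ |a_0(\varepsilon)| \le |f_\varepsilon - f| + |f| + \Big| \sum_{k=1}^{n(\varepsilon)} a_k(\varepsilon) e^{-kt} \Big|. $$

But $\lim_{t\to\infty} f(t) = 0$ and clearly $\lim_{t\to\infty} \sum_{k=1}^{n(\varepsilon)} a_k(\varepsilon) e^{-kt} = 0$.
Therefore, for any $\delta > 0$ there exists a value T such that $t > T$ implies
that $|f(t)| \le \delta$ and $\big| \sum_{k=1}^{n(\varepsilon)} a_k(\varepsilon) e^{-kt} \big| \le \delta$. We then have

$$ |a_0(\varepsilon)| \le |f_\varepsilon - f| + 2\delta \le \varepsilon + 2\delta. $$

Since $\delta$ was arbitrary, it follows that

$$ |a_0(\varepsilon)| \le \varepsilon. $$

But now consider the modified approximant

$$ \tilde f_\varepsilon = f_\varepsilon - a_0(\varepsilon) = \sum_{k=1}^{n(\varepsilon)} a_k(\varepsilon) e^{-kt}. $$

For any value of t,

$$ |f - \tilde f_\varepsilon| = |f - f_\varepsilon + a_0(\varepsilon)| \le |f - f_\varepsilon| + |a_0(\varepsilon)| \le 2\varepsilon. $$

Since $\varepsilon$ is arbitrary, a uniform exponential approximation to f having a
zero constant term can always be found.

Q.E.D.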

We now state without proof a generalization of this result that
permits the coefficients of t in the exponents of the approximating
function to be non-integer. The lemma is found in Kammler [1976] and is
based upon the Muntz-Szasz theorem (see Cheney [1966]).

Lemma 3.2.3 Let $0 < \lambda_1 < \lambda_2 < \cdots$ and assume that $\sum_{i=1}^{\infty} (1/\lambda_i)$ diverges.
Then the set of exponential sums that may be written as
finite linear combinations of the functions $e^{-\lambda_i t}$, i =
1,2,..., is dense in the space of continuous functions on
$[0,\infty)$ that vanish at infinity. In other words, a
continuous function on $[0,\infty)$ that vanishes at infinity
can be uniformly approximated by a linear combination of
exponentials where the coefficients of t in the exponents
need not be integers.

3.3 Approximating PDFs with Exponential Sums

We wish to develop an exponential sum approximation to a
probability density function. For a particular class of PDFs -- those
whose tails decay at least exponentially fast -- the results of the
preceding section can be applied to show that the class GH is dense with
respect to the PDFs of interest. That is, we approximate a PDF with an
exponential sum that is also a PDF.

Theorem 3.3.1 Let f be a PDF continuous on $[0,\infty)$ and let $f \to 0$
exponentially fast as $t \to \infty$. That is, $\lim_{t\to\infty} f(t) e^{\lambda_0 t} = 0$
for some $\lambda_0 > 0$. Then f can be uniformly approximated on
$[0,\infty)$ by a generalized hyperexponential PDF.

Proof: The proof consists of three parts. First we find an
exponential sum approximation; next, we modify the approximation so that
it is nonnegative; finally, we normalize the approximation so that its
area is unity.
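(i) Let $g(t) = f(t) e^{\lambda_0 t}$. By Lemma 3.2.3 we can approximate g(t) by a
function of the form

$$ \hat g = \sum_{k=1}^{n} a_k e^{-\lambda_k t} \qquad (\lambda_k > 0) $$

such that $|g - \hat g| \le \varepsilon$ for all $t \in [0,\infty)$. Thus we may write

$$ |g - \hat g| = \Big| f(t) e^{\lambda_0 t} - e^{\lambda_0 t} \sum_{k=1}^{n} a_k e^{-(\lambda_0 + \lambda_k) t} \Big| \le \varepsilon, $$

$$ e^{\lambda_0 t}\, \Big| f(t) - \sum_{k=1}^{n} a_k e^{-(\lambda_0 + \lambda_k) t} \Big| \le \varepsilon. $$

Therefore

$$ \Big| f(t) - \sum_{k=1}^{n} a_k e^{-(\lambda_0 + \lambda_k) t} \Big| \le \varepsilon e^{-\lambda_0 t} \le \varepsilon. \tag{3.3.1} $$

This shows that $\hat f = \sum_{k=1}^{n} a_k e^{-(\lambda_0 + \lambda_k) t}$ uniformly approximates f. Of course,
$\hat f$ may be negative for some values of t and so may not be a valid PDF.
(More on this subsequently.)

(ii) From (3.3.1) we have

$$ |f(t) - \hat f(t)| \le \varepsilon e^{-\lambda_0 t} \tag{3.3.2} $$

so that $0 \le f(t) \le \hat f(t) + \varepsilon e^{-\lambda_0 t}$, where the first inequality follows
from the fact that f is a PDF. Define the right-hand side to be

$$ \bar f \equiv \hat f(t) + \varepsilon e^{-\lambda_0 t} \ge f(t) \ge 0. \tag{3.3.3} $$

Then

$$ |f - \bar f| = |f - \hat f - \varepsilon e^{-\lambda_0 t}| \le |f - \hat f| + \varepsilon e^{-\lambda_0 t}, $$

$$ |f - \bar f| \le \varepsilon e^{-\lambda_0 t} + \varepsilon e^{-\lambda_0 t} = 2\varepsilon e^{-\lambda_0 t} \le 2\varepsilon. \tag{3.3.4} $$

Therefore, $\bar f$ is a nonnegative exponential sum that uniformly
approximates f. However, $\bar f \ge f$ from (3.3.3), so that

$$ \int_0^\infty \bar f \, dt \ge \int_0^\infty f \, dt = 1 $$

and $\bar f$ may not be a PDF.

(iii) To produce an approximation to f that is indeed a PDF, we must
normalize $\bar f$ so that its area is unity. Let

$$ A = \int_0^\infty \bar f \, dt \ge 1. $$

If A = 1, then $\bar f$ is a PDF and we are finished. If A > 1, define
$f' = \bar f / A$, so that $\int_0^\infty f' \, dt = 1$. It remains to show that $f'$ uniformly
approximates f on $[0,\infty)$. From (3.3.2) we have

$$ \hat f(t) \le f(t) + \varepsilon e^{-\lambda_0 t}. $$

Using (3.3.3)

$$ \bar f(t) = \hat f(t) + \varepsilon e^{-\lambda_0 t} \le f(t) + 2\varepsilon e^{-\lambda_0 t}. $$

Therefore

$$ A = \int_0^\infty \bar f \, dt \le \int_0^\infty f \, dt + \int_0^\infty 2\varepsilon e^{-\lambda_0 t} \, dt = 1 + \frac{2\varepsilon}{\lambda_0}. \tag{3.3.5} $$

Now consider

$$ |f - f'| = \Big| f - \frac{1}{A} \bar f \Big| = \frac{1}{A} |A f - \bar f| = \frac{1}{A} |A f - f + f - \bar f| = \frac{1}{A} |(A-1) f + f - \bar f| $$

$$ \le \frac{A-1}{A} |f| + \frac{1}{A} |f - \bar f| \le (A-1) |f| + |f - \bar f|. $$

The last inequality holds because $A \ge 1$. Finally, from (3.3.4) and
(3.3.5), we obtain

$$ |f - f'| \le \frac{2\varepsilon}{\lambda_0} |f| + 2\varepsilon \le \frac{2\varepsilon}{\lambda_0} \|f\| + 2\varepsilon. \tag{3.3.6} $$

The second of these inequalities follows from the boundedness of f,
which in turn is a consequence of the continuity of f and the fact that
$f \to 0$ as $t \to \infty$ (see, for example, Boas [1972], p. 78). Since the RHS of
(3.3.6) can be made as small as desired by an appropriate choice of $\varepsilon$,
$f'$ uniformly approximates f, is nonnegative, and integrates to unity and
therefore is a valid PDF. Furthermore,

$$ f' = \frac{\varepsilon}{A} e^{-\lambda_0 t} + \sum_{k=1}^{n} \frac{a_k}{A} e^{-(\lambda_0 + \lambda_k) t} = \sum_{k=0}^{n} \frac{a_k}{A} e^{-\lambda'_k t} \qquad (\lambda'_k > 0), $$

where $a_0 = \varepsilon$, $\lambda'_0 = \lambda_0$, and $\lambda'_k = \lambda_0 + \lambda_k$ for $k \ge 1$. Therefore, $f' \in$ GH.

Q.E.D.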
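Steps (ii) and (iii) of the proof are mechanical once an exponential-sum approximant is available. The following sketch carries them out; it assumes the coefficients from step (i) are already given (finding them is not addressed here), and the function names are hypothetical.

```python
import math


def gh_pdf_from_exponential_sum(coeffs, rates, eps, lam0):
    """Lift the approximant by eps * exp(-lam0 * t) and divide by its area.

    coeffs[k], rates[k] describe f_hat(t) = sum_k coeffs[k] * exp(-(lam0 + rates[k]) * t).
    Returns (weights, exponents) such that
        f'(t) = sum_j weights[j] * exponents[j] * exp(-exponents[j] * t),
    i.e. f' written as a (possibly signed) mixture of exponential densities.
    """
    amplitudes = [eps] + list(coeffs)
    exponents = [lam0] + [lam0 + r for r in rates]
    # A in the proof: each term a * exp(-mu * t) integrates to a / mu on [0, inf)
    area = sum(a / mu for a, mu in zip(amplitudes, exponents))
    weights = [a / (mu * area) for a, mu in zip(amplitudes, exponents)]
    return weights, exponents


def gh_pdf(weights, exponents, t):
    """Evaluate the signed exponential mixture density at t."""
    return sum(w * mu * math.exp(-mu * t) for w, mu in zip(weights, exponents))


# Illustrative input: f_hat(t) = 6 e^{-2t} - 6 e^{-3t} with lam0 = 1 and eps = 0.01
weights, exponents = gh_pdf_from_exponential_sum([6.0, -6.0], [1.0, 2.0], 0.01, 1.0)
```

The returned weights sum to one by construction and may be negative, which matches the defining property of the GH class.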

Let us now consider the class $R_n$ of PDFs having rational Laplace
transforms, where n is the degree of the denominator polynomial. The
roots of the denominator each have negative real part so that when a
partial fraction expansion is formed and the inverse transform taken,
there are at most n terms, each of the form
$t^k e^{-at} (A \cos bt + B \sin bt)$. Therefore, the PDF goes to zero
exponentially fast and is continuous. In other words, all PDFs that
are in $R_n$ satisfy the conditions of Theorem 3.3.1. We have then the
following corollary.

Corollary: Every PDF in $R_n$ can be uniformly approximated on $[0,\infty)$ by
a generalized hyperexponential density. That is, GH PDFs
are dense in $R_n$.
3.4 Approximating CDFs with Exponential Sums

In this subsection we wish to extend the exponential sum
approximation to cumulative distribution functions (CDFs). We begin by
showing that if two PDFs are close in some sense, then their
corresponding CDFs are also close. It then follows that any finite
mixture of Erlang CDFs can be approximated by a generalized
hyperexponential CDF. The results of subsection 3.1 are then used to
show that any CDF can be closely approximated by a generalized
hyperexponential CDF.


Lemma 3.4.1. Let f be a PDF continuous on $[0,\infty)$. If another PDF, g,
uniformly approximates f, then the CDF $G = \int_0^t g(x)\,dx$
uniformly approximates the CDF $F = \int_0^t f(x)\,dx$ on $[0,\infty)$.

Proof: For any $\varepsilon > 0$ there exists a value $t_0$ such that for $t \ge t_0$,
$F(t) \ge 1 - \varepsilon/2$. This follows from the existence of the integral
$\int_0^\infty f(x)\,dx = F(\infty) = 1$ by the Cauchy criterion (see, for example, Bartle
[1964], p. 345). Let g be such that $|f - g| \le \varepsilon/(2 t_0)$ for all $t \in [0,\infty)$
where, for the moment, we assume $t_0 \ne 0$. We now examine $|F - G|$ on the
intervals $[0, t_0]$ and $[t_0, \infty)$.

(i) $[0, t_0]$

$$ |F - G| = \Big| \int_0^t f\,dx - \int_0^t g\,dx \Big| = \Big| \int_0^t (f - g)\,dx \Big| \le \int_0^t |f - g|\,dx \le \int_0^{t_0} |f - g|\,dx \le \frac{\varepsilon}{2}. $$

(ii) $[t_0, \infty)$

From (i), $|F(t_0) - G(t_0)| \le \varepsilon/2$, so that $G(t_0) \ge F(t_0) - \varepsilon/2 \ge 1 - \varepsilon/2 - \varepsilon/2 = 1 - \varepsilon$.
By the monotonicity of G it follows that $G(t) \ge G(t_0) \ge 1 - \varepsilon$ for all $t \ge t_0$.
Therefore, on $[t_0, \infty)$, $F - G \ge 1 - \varepsilon/2 - G \ge -\varepsilon/2$ since $G(t) \le 1$ for all t.
Also $F - G \le 1 - G \le 1 - (1 - \varepsilon) = \varepsilon$.
Therefore, $|F - G| \le \varepsilon$. Combining the results from (i) and (ii) we have
that $|F - G| \le \varepsilon$ on $[0,\infty)$, so that G uniformly approximates F.

The only way that $t_0$ could be zero is if $\varepsilon/2 \ge 1$. However,
$|F - G| \le |F| + |G| \le 1 + 1 = 2 \le \varepsilon$; so again G uniformly approximates
F.

Q.E.D.

At this point, we pause to note that we have established the
desired denseness property of the class GH with respect to a subset of
CDFs. In particular, if F is an absolutely continuous CDF on $[0,\infty)$ and
its derivative is continuous and has an exponentially decaying tail,
then it follows from Theorem 3.3.1 and Lemma 3.4.1 that there exists a
GH CDF that uniformly approximates F. In other words, we can find a
$G \in$ GH with the property that $|F(t) - G(t)| \le \varepsilon$ for all $t \in [0,\infty)$.

Continuing with our general development, we note that an Erlang PDF
is defined on $[0,\infty)$ and has a Laplace transform of the form $(\lambda/(\lambda + s))^k$
where $\lambda$ is a positive real number. Consequently the Erlang PDFs belong
to $R_n$ and, from the corollary to Theorem 3.3.1, we obtain the following
corollary to the preceding lemma.
Corollary: Every Erlang CDF can be uniformly approximated on $[0,\infty)$
by a GH CDF.

Recall that $E_n^k(t)$ is the Erlang CDF obtained by taking the k-fold
convolution of the exponential CDF $1 - e^{-nt}$. Let us use the notation
$G_n^k(t)$ to represent a GH CDF that uniformly approximates $E_n^k(t)$ on $[0,\infty)$.
We now use the result stated in subsection 3.1 to show that any CDF on
$[0,\infty)$ can be approximated arbitrarily closely by a generalized
hyperexponential CDF.

Theorem 3.4.1 Let F be an arbitrary CDF defined on $[0,\infty)$. Then a
generalized hyperexponential CDF can be found that
approximates F arbitrarily closely in the topology of
weak convergence. In other words, the set of generalized
hyperexponential CDFs is dense in the set of all CDFs
defined on $[0,\infty)$.

Proof: From Equation (3.1.1) the sequence of CDFs defined by

$$ F_n = F(0) + \sum_{k=1}^{\infty} \left[ F\!\left(\tfrac{k}{n}\right) - F\!\left(\tfrac{k-1}{n}\right) \right] E_n^k(t) \tag{3.4.1} $$

converges to F at each continuity point of F. By the corollary to Lemma
3.4.1, there exists a GH distribution that uniformly approximates $E_n^k$ on
$[0,\infty)$, call it $G_n^k$. Therefore

$$ |E_n^k - G_n^k| \le \varepsilon \quad \text{on } [0,\infty). \tag{3.4.2} $$

Let $F(\tfrac{k}{n}) - F(\tfrac{k-1}{n}) = b_n^k$ and define $H_n$ as

$$ H_n = F(0) + \sum_{k=1}^{\infty} b_n^k G_n^k. \tag{3.4.3} $$

The existence of $H_n$ can be characterized as follows. Since $G_n^k$ is a CDF
it never exceeds unity. Therefore,

$$ \sum_{k=1}^{\infty} b_n^k G_n^k \le \sum_{k=1}^{\infty} b_n^k = 1 - F(0) $$

by the definition of $b_n^k$. Since both $G_n^k$ and $b_n^k$ are nonnegative, the
sequence of partial sums

$$ \sum_{k=1}^{K} b_n^k G_n^k $$

is bounded above and monotonically increasing with K, and so it has a limit.
At each continuity point t of F we have that $\lim_{n\to\infty} F_n(t) = F(t)$. That is,
for $\varepsilon > 0$ there exists an $N(\varepsilon,t)$ such that for all $n \ge N$, $|F_n(t) - F(t)| \le \varepsilon$.
We are now ready to show that $H_n(t)$ approximates F(t).


$$ |H_n(t) - F(t)| = |H_n(t) - F_n(t) + F_n(t) - F(t)| $$
$$ \le |H_n(t) - F_n(t)| + |F_n(t) - F(t)| $$
$$ \le |H_n(t) - F_n(t)| + \varepsilon. \tag{3.4.4} $$

From Equations (3.4.1) and (3.4.3),

$$ |H_n(t) - F_n(t)| = \Big| \sum_{k=1}^{\infty} b_n^k \big( G_n^k(t) - E_n^k(t) \big) \Big| \le \sum_{k=1}^{\infty} b_n^k \, |G_n^k(t) - E_n^k(t)|. $$

By Inequality (3.4.2), this becomes

$$ |H_n(t) - F_n(t)| \le \varepsilon \sum_{k=1}^{\infty} b_n^k \le \varepsilon. $$

Substituting in (3.4.4) yields

$$ |H_n(t) - F(t)| \le 2\varepsilon, \qquad n \ge N(\varepsilon,t). \tag{3.4.5} $$

Since $\varepsilon$ is arbitrary, for every value of t the sequence of CDFs $\{H_n(t)\}$
approximates F as closely as desired. Each approximant, $H_n(t)$, where n
depends upon t and $\varepsilon$, consists of an infinite sum of GH CDFs. We now
show that the infinite sum may be replaced by a finite sum.

It follows from the definition of $b_n^k$ that there exists a number
$K^*(n)$ such that for all $K \ge K^*(n)$,

$$ \sum_{k=K}^{\infty} b_n^k < \frac{1}{n}. $$

Now define

$$ H_n^{K^*(n)} = F(0) + \sum_{k=1}^{K^*(n)-1} b_n^k G_n^k(t) + \sum_{k=K^*(n)}^{\infty} b_n^k. \tag{3.4.6} $$

Next, consider the sequence of functions $\{H_n^{K^*(n)}\}$. For each $\varepsilon > 0$,
there exists $N(\varepsilon,t)$ such that for all $n \ge N$, $|H_n(t) - F(t)| \le \varepsilon$ by
(3.4.5). Now choose $n^*(\varepsilon,t) = \max(N, 1/\varepsilon)$. Therefore, for all $n \ge n^*$
we have

$$ |H_n^{K^*(n)}(t) - F(t)| = |H_n^{K^*(n)}(t) - H_n(t) + H_n(t) - F(t)| $$
$$ \le |H_n^{K^*(n)}(t) - H_n(t)| + |H_n(t) - F(t)| \tag{3.4.7} $$
$$ \le |H_n^{K^*(n)}(t) - H_n(t)| + \varepsilon. $$
.  • 
The last inequality holds since $n \ge n^* \ge N$. Now from (3.4.3) and
(3.4.6),

$$ |H_n^{K^*(n)}(t) - H_n(t)| = \Big| \sum_{k=K^*(n)}^{\infty} b_n^k G_n^k(t) - \sum_{k=K^*(n)}^{\infty} b_n^k \Big| = \Big| \sum_{k=K^*(n)}^{\infty} b_n^k \big( G_n^k(t) - 1 \big) \Big| \le \sum_{k=K^*(n)}^{\infty} b_n^k < \frac{1}{n} \le \varepsilon \tag{3.4.8} $$

since $|G_n^k(t) - 1| \le 1$ and $n \ge n^* \ge 1/\varepsilon$. Substituting (3.4.8) into
(3.4.7) yields

$$ |H_n^{K^*(n)}(t) - F(t)| \le \varepsilon + \varepsilon = 2\varepsilon, \qquad n \ge n^*. \tag{3.4.9} $$

By the way $H_n^{K^*(n)}$ was constructed, it is a CDF and (3.4.9) establishes
that $\{H_n^{K^*(n)}\}$ converges weakly to F. Each $H_n^{K^*(n)}$ contains a finite linear
combination of CDFs each of which is GH. In the event that F(0) = 0,
$H_n^{K^*(n)}$ is a (finite) convex combination of these GH CDFs and so is
itself GH. When F(0) > 0, we can write $H_n^{K^*(n)}$ as the mixture

$$ H_n^{K^*(n)}(t) = p_1 U(t) + p_2 \, \frac{\sum_{k=1}^{K^*(n)-1} b_n^k G_n^k(t)}{\sum_{k=1}^{K^*(n)-1} b_n^k} $$

where

$$ p_1 = F(0) + \sum_{k=K^*}^{\infty} b_n^k, \qquad p_2 = \sum_{k=1}^{K^*-1} b_n^k, $$

and U(t) = 1 is the CDF of an atom at t = 0. From the definition of the
$\{b_n^k\}$, $p_1 + p_2 = 1$. If the atom at t = 0 is thought of as an exponential
distribution with vanishingly small mean, $H_n^{K^*(n)}$ can be viewed as a GH
CDF for any value of F(0).

To recapitulate, we have demonstrated the existence of a sequence
of GH CDFs, $\{H_n^{K^*(n)}\}$, that converges to a given CDF, F, at each of its
continuity points.

Q.E.D.
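The truncation step of the proof can be sketched in code. The function below picks the smallest admissible truncation index, which is one possible choice of $K^*(n)$ (the proof only requires existence), and returns the weights $b_n^k$ together with the masses $p_1$ and $p_2$. The function name and the safety bound `k_limit` are hypothetical.

```python
import math


def truncate_erlang_mixture(F, n, k_limit=10**6):
    """Return (k_star, b, p1, p2) for the finite mixture (3.4.6).

    b[k-1] = F(k/n) - F((k-1)/n) for k = 1, ..., k_star - 1.
    The tail mass beyond k_star - 1 is below 1/n and is moved, together
    with F(0), onto the atom at zero (mass p1); p2 is the retained mass.
    """
    b = []
    covered = F(0.0)               # F((k-1)/n) for the current k
    k = 1
    # tail from index k equals 1 - F((k-1)/n); stop once it is below 1/n
    while 1.0 - covered >= 1.0 / n:
        if k > k_limit:
            raise ValueError("tail mass did not fall below 1/n within k_limit")
        nxt = F(k / n)
        b.append(nxt - covered)
        covered = nxt
        k += 1
    k_star = k
    p2 = sum(b)
    p1 = 1.0 - p2
    return k_star, b, p1, p2


# Example with the unit-mean exponential CDF and n = 10
k_star, b, p1, p2 = truncate_erlang_mixture(
    lambda t: 1.0 - math.exp(-t) if t > 0 else 0.0, 10)
```

Each retained weight would then multiply a GH approximant $G_n^k$ of the corresponding Erlang CDF, which this sketch does not construct.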


If the limiting CDF is continuous, then weak convergence becomes
pointwise convergence. A result due to Polya, cited on p. 86 of Chung
[1974], establishes that the convergence is in fact uniform in this
case. Therefore, any continuous CDF with support on the nonnegative
real line can be uniformly approximated by GH CDFs.
4. CONCLUDING REMARKS

We  have  made  a  case  for  considering  generalized  exponential 
mixtures  to  approximate  any  CDF  defined  on  [0,«>)  by  demonstrating  that 
the  class  GH  is  dense  in  the  class  of  all  CDFs,  i.e.,  any  CDF  can  be 
approximated  as  closely  as  desired  by  a  member  of  GH.  Therefore,  GH 
joins  other  known  dense  classes  of  probability  distributions  such  as 
those  of  phase-type  and  those  having  rational  Laplace  transforms.  In 
addition  to  the  denseness  property,  GH  distributions  have  a  unique 
representation;  this  property  is  not  shared  by  all  dense  classes  of 
distributions.  We  also  presented  a  set  of  relations  positioning  the  GH 
class  among  other  often  used  classes  of  distribution  functions.  The 
properties  of  the  GH  class  of  distributions  make  it  attractive  for  both 
numerical  and  statistical  computations. 

This work has focused on theoretical results and does not discuss
the important area of how to construct an approximating GH distribution.
Recent work, however, has extended to generalized exponential mixtures a
maximum likelihood-based algorithm for fitting mixed Weibull
distributions to empirical data. Questions that remain for future
investigation include determining the number of terms required for a
finite mixture to be "good enough" and the related question of the
minimum achievable distance between a given CDF and the class of GH
distributions having a fixed number of terms.